Restructuring Arrays for Efficient Parallel Loop Execution
Abstract
In a sequential program, data are often structured in a way that is optimized for sequential execution. However, when the program is parallelized, the data access pattern may change drastically. If the structure of the data is not changed accordingly, parallel performance will suffer. In this paper, we consider this problem in the context of runtime loop parallelization [8, 9], which is a general technique to parallelize loops not amenable to compile-time analysis. In a parallel execution of a loop, iterations may be performed in a very different order than in the sequential execution. This may result in undesirable cache effects on distributed shared-memory multiprocessors, unless the structure of the arrays accessed by these iterations is changed accordingly. We discuss what these problems are and how they arise. We then describe two data restructuring techniques to address them: the restructuring of read-write arrays to reduce inter-processor communication due to false sharing, and the restructuring of read-only arrays to improve spatial locality. We also report experiments on a KSR1 [3] to evaluate the effectiveness of these techniques and the preprocessing and postprocessing overheads they entail. The results show that the restructuring techniques can substantially improve the performance of the parallelized loop. When restructuring overheads are ignored, we see a doubling of parallel speedups. While restructuring overheads can be quite significant, they can often be amortized across multiple loop executions so that they do not outweigh the performance benefits. In our experiments, it takes only two loop executions to achieve this.
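The two restructuring ideas named in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: the function names, the `schedule` and `owner_elems` inputs, and the assumption that each read-write element is written by exactly one processor are all hypothetical choices made for illustration. The first function gathers read-only data into the order in which each processor visits it; the second assigns each processor's read-write elements to slots that start on a fresh cache line, so that no line holds elements of two processors.

```python
def restructure_read_only(a, idx, schedule):
    """Gather a[idx[i]] per processor, in that processor's execution order.

    a        : the original read-only array
    idx      : idx[i] is the element of a read by iteration i (assumed input)
    schedule : schedule[p] lists the iterations run by processor p, in order
    """
    return [[a[idx[i]] for i in iters] for iters in schedule]


def restructure_read_write(owner_elems, line_size):
    """Map original element indices to new slots, one cache-line-aligned
    region per processor (assumes each element has a single writer).

    owner_elems : owner_elems[p] lists the elements written by processor p
    line_size   : number of array elements per cache line (assumed known)
    Returns (new_index, total_slots).
    """
    new_index = {}
    offset = 0
    for elems in owner_elems:
        for k, e in enumerate(elems):
            new_index[e] = offset + k
        # Round the region up to a whole number of cache lines (padding).
        offset += -(-len(elems) // line_size) * line_size
    return new_index, offset


# Hypothetical example data.
a = [10, 20, 30, 40, 50, 60]
idx = [5, 0, 3, 1, 4, 2]
schedule = [[0, 2, 4], [1, 3, 5]]
local_copies = restructure_read_only(a, idx, schedule)

new_index, total_slots = restructure_read_write([[4, 0, 7], [2, 5]], line_size=4)
```

In this sketch each processor then walks its own `local_copies[p]` sequentially, and a postprocessing step (not shown) would copy read-write elements back to their original positions through `new_index`. Both steps cost extra time, which is the overhead the abstract says must be amortized over repeated loop executions.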
Similar resources
Restructuring Arrays for Efficient Parallel Loop
In a sequential program, data are often structured in a way that is optimized for a sequential execution. However, when the program is parallelized, the data access pattern may change drastically. If the structure of the data is not changed accordingly, parallel performance will suffer. In this paper, we consider this problem in the context of runtime loop parallelization [8, 9], which is a gene...
Extending Gödel for Expressing Restricted Quantifications and Arrays
The expressiveness of the declarative language Gödel can be improved by adding to it bounded quantifications, i.e., quantifications over finite domains, and arrays. Many problems can be expressed more concisely using bounded quantifications than using recursion. Arrays are natural for many applications, e.g., in scientific computing, and are conveniently used in bounded quantifications. Treating bou...
GAPS: Genetic Algorithm Optimised Parallelisation
The compilation of FORTRAN programs for SPMD execution on parallel architectures often requires the application of program restructuring transformations such as loop interchange, loop distribution, loop fusion, loop skewing and statement reordering. Determining the optimal transformation sequence that minimises execution time for a given program is an NP-complete problem. The hypothesis of the ...
Loop and Data Transformations: A
In this tutorial, we address the problem of restructuring a (possibly sequential) program to improve execution efficiency on parallel machines. This restructuring involves the transformation and partitioning of loop structures and data so as to improve parallelism, static and dynamic locality, and load balance. We present previous and ongoing work on loop and data transformations and motivate a u...
Loop and Data Transformations: A Tutorial
In this tutorial, we address the problem of restructuring a (possibly sequential) program to improve execution efficiency on parallel machines. This restructuring involves the transformation and partitioning of loop structures and data so as to improve parallelism, static and dynamic locality, and load balance. We present previous and ongoing work on loop and data transformations and motivate a u...